Visual and Audio Aware Bi-Modal Video Emotion Recognition

Authors

  • Siqi Xiang
  • Wenge Rong
  • Zhang Xiong
  • Min Gao
  • Qingyu Xiong
Abstract

With the rapid growth in the amount of video online, analyzing and predicting the affective impact that video content will have on viewers has attracted much attention in the community. To address this challenge, several different kinds of information about video clips have been exploited. Traditional methods normally focus on a single modality, either audio or visual. Later, some researchers tried to establish multi-modal schemes and spent considerable time choosing and extracting features under different fusion strategies. In this research, we propose an end-to-end model that automatically extracts features and targets an emotion classification task by integrating audio and visual features and also incorporating the temporal characteristics of the video. An experimental study on the commonly used MediaEval 2015 Affective Impact of Movies dataset shows this method's potential, and it is expected that this work could provide some insight for future video emotion recognition from a feature fusion perspective.
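The abstract does not specify the architecture, so the following is only a minimal sketch of the general idea of feature-level audio-visual fusion with a temporal summary, not the authors' model. Everything in it is an assumption made for illustration: the function names (`mean_pool`, `fuse_and_classify`), the feature dimensions, the three output classes, the random untrained weights, and the use of simple mean pooling over time as a stand-in for whatever temporal modeling the paper actually uses.

```python
import math
import random


def mean_pool(frames):
    """Average a sequence of per-time-step feature vectors over time."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(visual_frames, audio_frames, weights, bias):
    """Pool each modality over time, concatenate, and apply a linear classifier."""
    # Feature-level fusion: concatenate the two pooled feature vectors.
    fused = mean_pool(visual_frames) + mean_pool(audio_frames)
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(scores)


# Toy data (hypothetical): 10 time steps, 4 visual and 3 audio features each.
random.seed(0)
visual = [[random.random() for _ in range(4)] for _ in range(10)]
audio = [[random.random() for _ in range(3)] for _ in range(10)]

# Untrained random weights for 3 illustrative emotion classes.
weights = [[random.uniform(-1, 1) for _ in range(7)] for _ in range(3)]
bias = [0.0, 0.0, 0.0]

probs = fuse_and_classify(visual, audio, weights, bias)
print(probs)
```

In an end-to-end model the feature extractors, the temporal component, and the classifier would be trained jointly; here the features are random numbers and the weights are untrained, so the printed probabilities carry no meaning beyond showing the data flow.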


Similar resources

Spatiotemporal Networks for Video Emotion Recognition

Our article presents an audio-visual based multi-modal emotion classification system. Considering that deep learning approaches to facial analysis have recently demonstrated high performance, in our work we use convolutional neural networks (CNNs) for emotion recognition in video, relying on temporal averaging and pooling operations reminiscent of widely used approaches for the spatial ...


Towards Efficient Multi-Modal Emotion Recognition

The paper presents a multi‐modal emotion recognition system exploiting audio and video (i.e., facial expression) information. The system first processes both sources of information individually to produce corresponding matching scores and then combines the computed matching scores to obtain a classification decision. For the video part of the system, a novel ...


Bi-modal emotion recognition from expressive face and body gestures

Psychological research findings suggest that humans rely on the combined visual channels of face and body more than any other channel when they make judgments about human communicative behavior. However, most of the existing systems attempting to analyze the human nonverbal behavior are mono-modal and focus only on the face. Research that aims to integrate gestures as an expression mean has onl...


An Audio-Visual Approach to Music Genre Classification through Affective Color Features

This paper presents a study on classifying music by affective visual information extracted from music videos. The proposed audio-visual approach analyzes genre specific utilization of color. A comprehensive set of color specific image processing features used for affect and emotion recognition derived from psychological experiments or art-theory is evaluated in the visual and multi-modal domain...


Music Emotion Recognition from Lyrics: A Comparative Study

We present a study on music emotion recognition from lyrics. We start from a dataset of 764 samples (audio+lyrics) and perform feature extraction using several natural language processing techniques. Our goal is to build classifiers for the different datasets, comparing different algorithms and using feature selection. The best results (44.2% F-measure) were attained with SVMs. We also perform ...




Publication date: 2017